Skip to content

Profile callprofiler with different testcases#829

Open
ThreeMonth03 wants to merge 1 commit into
solvcon:masterfrom
ThreeMonth03:profile-callprofiler-cases
Open

Profile callprofiler with different testcases#829
ThreeMonth03 wants to merge 1 commit into
solvcon:masterfrom
ThreeMonth03:profile-callprofiler-cases

Conversation

@ThreeMonth03
Copy link
Copy Markdown
Collaborator

@ThreeMonth03 ThreeMonth03 commented May 25, 2026

Introduction

Issue #831 implies that a slow callprofiler might make profiling error bigger. To monitor whether callprofiler works fast in every cases, this pull request profilers callprofiler with different testcases by gprof. Additionally, this pull request shows what and why profiler works slowly under some cases.

Analysis

There are 4 functions and 4 kinds of hyperparameters, totally 16 combinations in benchmark. As for the detail of methodology and profiling result, please go over in appendix section.

The following sheet is the consuming time of top 5 combinations:

Rank Benchmark / Function pattern Operations Repeats Workload seconds
1 wide_siblings 50000 1 6.167120 s
2 wide_siblings 1000 1000 2.107130 s
3 wide_siblings 10000 5 0.807280 s
4 balanced_tree 1000 1000 0.267998 s
5 deep_chain 1000 1000 0.253346 s

We notice that top 3 slowest functions are wide_siblings(), which looks like:

void wide_siblings()
{
     profile_empty_function1();
     profile_empty_function2();
     // ...
     profile_empty_function{Operations}();
}

It means that callprofiler works slow when profiling a function, which has a large number of direct callees in the same call depth. Especially the top 1 case, it takes 6 second for callprofiler to profile 50000 empty direct callees.

Profiler works slowly in these cases, since the data structure of profiler is trie, and it takes linear time to find target children for a node. The data structure of children list is std::list.

RadixTreeNode<T> * get_child(key_type key) const
{
auto it = std::find_if(m_children.begin(), m_children.end(), [&](const auto & child)
{ return child->key() == key; });
return (it != m_children.end()) ? it->get() : nullptr;
}

using child_list_type = std::list<std::unique_ptr<RadixTreeNode<T>>>;

Every node in trie corresponds to a registered function, and there is a pointer pointing to a node which function is executing. When a registered function calls a registered callee, the profiler would find the corresponding child nodes at first, then move the pointer to the child node. If a registered function returns, the pointer would be moved to its parent.

Therefore, if there are n registered callees, it takes quardratic times to search n children nodes. For example, there are n = 50000 registered callees in the first case. It takes some time to find target children nodes of n callees, since O(n^2) ~= O(50000^2).

In conclusion, if callprofiler needs profiling a function with a lot of direct callees in real life, the search capability of CallProfiler::start_caller() should be optimized furthermore.

Appendix

Benchmark

To simulate the requirements in real life, I design 4 types of functions in benchmarks. In every functions, there are some operations, which are empty functions with different names.

Function types

  1. wide_siblings: All subfunctions are at the same layer.
#define PROFILE_EMPTY(ID)                 \
    void profile_empty_##ID()             \
    {                                     \
        MODMESH_PROFILE_FUNCTION();       \
    }

PROFILE_EMPTY(000000)
PROFILE_EMPTY(000001)
PROFILE_EMPTY(000002)
PROFILE_EMPTY(000003)

void run_wide_siblings()
{
    MODMESH_PROFILE_FUNCTION();

    profile_empty_000000();
    profile_empty_000001();
    profile_empty_000002();
    profile_empty_000003();
}
  1. deep_chain: All subfunctions are at the different layers.
void profile_empty_000003()
{
    MODMESH_PROFILE_FUNCTION();
}

void profile_empty_000002()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000003();
}

void profile_empty_000001()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000002();
}

void profile_empty_000000()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000001();
}

void run_deep_chain()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000000();
}
  1. balanced_tree: The dependency of subfunctions looks like a balanced tree.
void profile_empty_000000() { MODMESH_PROFILE_FUNCTION(); }
void profile_empty_000002() { MODMESH_PROFILE_FUNCTION(); }
void profile_empty_000004() { MODMESH_PROFILE_FUNCTION(); }
void profile_empty_000006() { MODMESH_PROFILE_FUNCTION(); }

void profile_empty_000001()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000000();
    profile_empty_000002();
}

void profile_empty_000005()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000004();
    profile_empty_000006();
}

void profile_empty_000003()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000001();
    profile_empty_000005();
}

void run_balanced_tree()
{
    MODMESH_PROFILE_FUNCTION();
    profile_empty_000003();
}
  1. hot_name_reuse: All subfunctions are at the same layer, but reuse the same name for 100 consecutive subfunctions, then switch to the next name.
#define PROFILE_EMPTY(ID)                 \
    void profile_empty_##ID()             \
    {                                     \
        MODMESH_PROFILE_FUNCTION();       \
    }

PROFILE_EMPTY(000000)
PROFILE_EMPTY(000001)
PROFILE_EMPTY(000002)
PROFILE_EMPTY(000003)

void run_hot_name_reuse()
{
    MODMESH_PROFILE_FUNCTION();

    for (std::size_t repeat = 0; repeat < 100; ++repeat)
    {
        profile_empty_000000();
        profile_empty_000001();
        profile_empty_000002();
        profile_empty_000003();
    }
}

By the way, this pull request sweeps the number of operations in every functions. The number of operations might be 100, 1000, 10000, or 50000. To obtain a precise profiling result, this pull request also repeats and resets the profiler when the number of operations is small, because gprof is sampling-based profiler.

The following data are measured on WSL2, with intel 13700 CPU.
Since gprof is integrated with g++, scripts in this pull request are only supported for linux platform now.

wide_siblings

operations repeats workload seconds
100 10000 2.293520E-01
1000 1000 2.107130E+00
10000 5 8.072800E-01
50000 1 6.167120E+00

gprof top 5: operations 100, repeats 10000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 55.56      0.05     0.05  1010000     0.05     0.05  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 22.22      0.07     0.02    59617     0.34     0.34  modmesh::profiling::detail::profile_empty_049983(unsigned long, unsigned long)
 11.11      0.08     0.01  1000000     0.01     0.06  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
 11.11      0.09     0.01                             modmesh::profiling::detail::profile_empty_000090(unsigned long, unsigned long)
  0.00      0.09     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)

gprof top 5: operations 1000, repeats 1000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 92.43      1.71     1.71  1001000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  2.16      1.75     0.04  1001000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  1.08      1.77     0.02  1001000     0.00     0.00  modmesh::CallProfiler::end_caller()
  1.08      1.79     0.02     8727     0.00     0.00  modmesh::profiling::detail::profile_empty_049983(unsigned long, unsigned long)
  0.54      1.83     0.01                             modmesh::profiling::detail::profile_empty_000495(unsigned long, unsigned long)

gprof top 5: operations 10000, repeats 5

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 96.00      0.72     0.72    50005     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  1.33      0.73     0.01   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  1.33      0.74     0.01    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  1.33      0.75     0.01                             modmesh::profiling::detail::profile_empty_003445(unsigned long, unsigned long)
  0.00      0.75     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()

gprof top 5: operations 50000, repeats 1

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 98.69      6.03     6.03    50001     0.12     0.12  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.33      6.05     0.02    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.16      6.06     0.01    50000     0.00     0.12  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
  0.16      6.07     0.01        1    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
  0.16      6.08     0.01                             modmesh::profiling::detail::profile_empty_011490(unsigned long, unsigned long)

deep_chain

operations repeats workload seconds
100 10000 1.706320E-01
1000 1000 2.533460E-01
10000 5 3.774810E-02
50000 1 6.268040E-02

gprof top 5: operations 100, repeats 10000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 33.33      0.01     0.01  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 33.33      0.02     0.01  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
 33.33      0.03     0.01    10000     0.00     0.00  modmesh::RadixTreeNode<modmesh::CallerProfile>::~RadixTreeNode()
  0.00      0.03     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.03     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()

gprof top 5: operations 1000, repeats 1000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 33.33      0.01     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 33.33      0.02     0.01  1001000     0.01     0.01  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 33.33      0.03     0.01     1000    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
  0.00      0.03     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.03     0.00  1001000     0.00     0.02  modmesh::CallProfiler::end_caller()

gprof top 5: operations 10000, repeats 5

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00    50000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 50000, repeats 1

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 25.00      0.01     0.01    50001     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 25.00      0.02     0.01       13     0.77     0.77  std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_rehash(unsigned long, unsigned long const&)
 25.00      0.03     0.01        1    10.00    10.00  void std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_assign<std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > >(std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> > const&, std::__detail::_ReuseOrAllocNode<std::allocator<std::__detail::_Hash_node<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int>, true> > > const&)
 25.00      0.04     0.01                             modmesh::profiling::detail::profile_empty_043209(unsigned long, unsigned long)
  0.00      0.04     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)

balanced_tree

operations repeats workload seconds
100 10000 1.841660E-01
1000 1000 2.679980E-01
10000 5 3.871620E-02
50000 1 5.947280E-02

gprof top 5: operations 100, repeats 10000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 60.00      0.03     0.03  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 20.00      0.04     0.01  1010000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.05     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.05     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.05     0.00  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 1000, repeats 1000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 83.33      0.05     0.05  1001000     0.05     0.05  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
 16.67      0.06     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.06     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.06     0.00  1001000     0.00     0.05  modmesh::CallProfiler::end_caller()
  0.00      0.06     0.00  1000000     0.00     0.01  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 10000, repeats 5

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
100.00      0.01     0.01                             modmesh::profiling::detail::profile_empty_000026(unsigned long, unsigned long)
  0.00      0.01     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

gprof top 5: operations 50000, repeats 1

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
100.00      0.01     0.01        2     5.00     5.00  modmesh::RadixTreeNode<modmesh::CallerProfile>::~RadixTreeNode()
  0.00      0.01     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

hot_name_reuse

operations repeats workload seconds
100 10000 8.356190E-02
1000 1000 8.568760E-02
10000 5 7.355400E-03
50000 1 2.580690E-02

gprof top 5: operations 100, repeats 10000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00  3030000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00  1010000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00  1010000     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00  1010000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00  1000000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 1000, repeats 1000

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 50.00      0.01     0.01  1001000     0.01     0.01  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
 50.00      0.02     0.01  1000000     0.01     0.02  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)
  0.00      0.02     0.00  3003000     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.02     0.00  1001000     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.02     0.00  1001000     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)

gprof top 5: operations 10000, repeats 5

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  Ts/call  Ts/call  name    
  0.00      0.00     0.00   150015     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.00     0.00    50005     0.00     0.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.00     0.00    50005     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.00     0.00    50000     0.00     0.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

gprof top 5: operations 50000, repeats 1

  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ns/call  ns/call  name    
100.00      0.01     0.01    50001   200.00   200.00  modmesh::CallProfiler::start_caller(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()> const&)
  0.00      0.01     0.00   150003     0.00     0.00  std::_Function_handler<void (), modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)::{lambda()#1}>::_M_manager(std::_Any_data&, std::_Any_data const&, std::_Manager_operation)
  0.00      0.01     0.00    50001     0.00     0.00  modmesh::CallProfiler::end_caller()
  0.00      0.01     0.00    50001     0.00     0.00  std::pair<std::_Rb_tree_iterator<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, bool> std::_Rb_tree<modmesh::RadixTreeNode<modmesh::CallerProfile>*, modmesh::RadixTreeNode<modmesh::CallerProfile>*, std::_Identity<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::less<modmesh::RadixTreeNode<modmesh::CallerProfile>*>, std::allocator<modmesh::RadixTreeNode<modmesh::CallerProfile>*> >::_M_insert_unique<modmesh::RadixTreeNode<modmesh::CallerProfile>*>(modmesh::RadixTreeNode<modmesh::CallerProfile>*&&)
  0.00      0.01     0.00    50000     0.00   200.00  modmesh::CallProfilerProbe::CallProfilerProbe(modmesh::CallProfiler&, char const*)

@ThreeMonth03 ThreeMonth03 marked this pull request as draft May 25, 2026 17:55
@ThreeMonth03 ThreeMonth03 force-pushed the profile-callprofiler-cases branch 2 times, most recently from 22b55de to 65972c4 Compare May 26, 2026 06:03
@ThreeMonth03 ThreeMonth03 marked this pull request as ready for review May 26, 2026 07:05
Copy link
Copy Markdown
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yungyuc Please take a look. This pull request is quite long, so I'm wondering whether there are better ways to generate code and profile the benchmark on different platform.

Comment on lines +77 to +81
- name: make cprof
if: runner.os == 'Linux'
run: |
make cprof

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Profile profiler only on linux.

return nullptr;
}

bool run_named_case(std::string_view label, std::size_t size, std::size_t repeat_count)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Run different types of functions with different hyperparameter.

Comment on lines +75 to +79
std::cout << "RESULT workload=" << label
<< " operations=" << operation_count
<< " repeats=" << repeat_count
<< " workload_seconds=" << elapsed.count()
<< '\n';
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file will print the wall time of benchmark, because we cannot obtain wall time from gprof

Comment on lines +43 to +56
void configure_large_stack()
{
#if defined(__linux__)
rlimit limit{};
if (getrlimit(RLIMIT_STACK, &limit) == 0)
{
if (RLIM_INFINITY == limit.rlim_max || limit.rlim_cur < limit.rlim_max)
{
limit.rlim_cur = limit.rlim_max;
static_cast<void>(setrlimit(RLIMIT_STACK, &limit));
}
}
#endif
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Configure enough stack size at first, because the depth of callers may be 50000.

Comment on lines +36 to +41
std::array<case_definition, 4> const case_definitions{{
{"wide_siblings", &workload::run_wide_siblings},
{"deep_chain", &workload::run_deep_chain},
{"balanced_tree", &workload::run_balanced_tree},
{"hot_name_reuse", &workload::run_hot_name_reuse},
}};
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 kinds of benchmark. They are generated by python scripts.

Comment on lines +18 to +25
add_custom_command(
OUTPUT ${CPROF_GENERATED_SOURCES}
COMMAND "${PYTHON_EXECUTABLE}" "${CPROF_GENERATOR}"
--output-dir "${CPROF_GENERATED_DIR}"
--shards "${CPROF_SHARD_COUNT}"
DEPENDS "${CPROF_GENERATOR}"
VERBATIM
)
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generate benchmarks.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to generate cpp files with macro, but it is too slow to generate 50000 functions.

Comment thread profiling/cprof/run.py


if __name__ == "__main__":
main()
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scripts to run the execute file of profiling/cprof/callprofiler_gprof.cpp.

Comment thread CMakeLists.txt
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure whether to put cpp files in /profiling.

@ThreeMonth03 ThreeMonth03 force-pushed the profile-callprofiler-cases branch from 65972c4 to 45dc6fb Compare May 26, 2026 08:07
Copy link
Copy Markdown
Member

@yungyuc yungyuc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please describe what you wanted to see and summarize what you found in each of the 4 profiling cases in the opening comment. It is not obvious what you want to say with the plain profiling data.

To profile the CallProfiler is a good idea.

@yungyuc yungyuc added the performance Profiling, runtime, and memory consumption label May 26, 2026
@ThreeMonth03 ThreeMonth03 requested a review from yungyuc May 27, 2026 18:02
Copy link
Copy Markdown
Collaborator Author

@ThreeMonth03 ThreeMonth03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yungyuc Please take a look. I think I've pointed out why we need to optimize some hotspots now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Profiling, runtime, and memory consumption

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants